Not the musician, the pipeline!
TRANSLATE PYTHON TO R
Translated Pre-filter Step, Benchmarked Results
RfiddleR::run_fiddle(
  data_file = "C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/ehrBOTreformat.csv",
  population_file = "C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/patients.csv",
  config_file = "C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/config-default.yaml",
  # no trailing "/": the log below shows value_types.csv being saved as "fiddle_output_ffillvalue_types.csv"
  output_dir = "C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/fiddle_output_ffill",
  T_ = 52,
  dt = 1,
  theta_1 = 0.001,
  theta_2 = 0.001,
  impute_method = "ffill"
)
[1] "Input:"
[2] " Data : C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/ehrBOTreformat.csv"
[3] " Population: C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/patients.csv"
[4] " Config : C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/config-default.yaml"
[5] ""
[6] "Output directory: C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/fiddle_output_ffill"
[7] ""
[8] "Input arguments:"
[9] " T = 52.0"
[10] " dt = 1.0"
[11] " theta_1 = 0.001"
[12] " theta_2 = 0.001"
[13] " theta_freq = 1.0"
[14] " k = 1 ['mean']"
[15] " impute_method = ffill"
[16] ""
[17] "discretize = no"
[18] ""
[19] "N = 10000"
[20] "L = 52"
[21] ""
[22] ""
[23] "================================================================================"
[24] "1) Pre-filter"
[25] "================================================================================"
[26] "Remove rows not in population"
[27] "Remove rows with t outside of [0, 52]"
[28] "Remove rare variables (<= 0.001)"
[29] "Total variables : 42"
[30] "Rare variables : 0"
[31] "Remaining variables : 42"
[32] "# rows (original) : 3129266"
[33] "# rows (filtered) : 3129266"
[34] ""
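Because this section is about porting FIDDLE's Python pre-filter to R, here is a minimal base-R sketch of the three rules logged above. It is an illustration under assumptions, not the RfiddleR code: the long-format input is assumed to have columns `ID`, `t`, `variable_name` and `variable_value`, time-invariant rows are assumed to carry `t = NA`, and a variable is assumed to be "rare" when it is observed for at most `theta_1` of the population. `prefilter_sketch` is a hypothetical name.

```r
# Hypothetical sketch of FIDDLE-style pre-filtering in base R.
# Assumed columns: ID, t, variable_name, variable_value; t = NA means time-invariant.
prefilter_sketch <- function(df, population, T_, theta_1) {
  N <- nrow(population)
  # 1) Keep only rows whose ID belongs to the population
  df <- df[df$ID %in% population$ID, , drop = FALSE]
  # 2) Keep time-invariant rows and rows with 0 <= t <= T_
  df <- df[is.na(df$t) | (df$t >= 0 & df$t <= T_), , drop = FALSE]
  # 3) Fraction of the population with at least one record of each variable
  ids_per_var <- tapply(df$ID, df$variable_name, function(x) length(unique(x)))
  frac <- ids_per_var / N
  rare <- names(frac)[frac <= theta_1]
  df[!(df$variable_name %in% rare), , drop = FALSE]
}

# Usage: ID 5 is outside the population, t = 60 is outside [0, 52],
# and "rare_lab" is seen for 1 of 4 patients (0.25 <= theta_1)
pop <- data.frame(ID = 1:4)
df <- data.frame(
  ID = c(1, 2, 1, 2, 3, 5, 4),
  t = c(NA, NA, 3, 10, 60, 2, 1),
  variable_name = c("age", "age", "hr", "hr", "hr", "hr", "rare_lab"),
  variable_value = c(70, 64, 80, 85, 90, 75, 1),
  stringsAsFactors = FALSE
)
out <- prefilter_sketch(df, pop, T_ = 52, theta_1 = 0.25)
```

Under this assumed rule, with `theta_1 = 0.001` and N = 10000 a variable would be dropped only if it appeared for at most 10 patients; the run above reports none.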
[35] "================================================================================"
[36] "2) Transform; 3) Post-filter"
[37] "================================================================================"
[38] ""
[39] "--------------------------------------------------------------------------------"
[40] "*) Detecting and parsing value types"
[41] "--------------------------------------------------------------------------------"
[42] "Parsing hierarchical values"
[43] "Saved as: C:/Users/mthro/Desktop/phs-EHRanalysis/final-paper/fiddle_output_ffillvalue_types.csv"
[44] ""
[45] "--------------------------------------------------------------------------------"
[46] "*) Separate time-invariant and time-dependent"
[47] "--------------------------------------------------------------------------------"
[48] "Variables (time-invariant): 6"
[49] "Variables (time-dependent): 36"
[50] "# rows (time-invariant): 60000"
[51] "# rows (time-dependent): 3069266"
[52] ""
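The split logged above can be mirrored in R under the same assumption that time-invariant records carry `t = NA`. The hypothetical helper below also returns the variable and row counts in the form the log prints them.

```r
# Hypothetical sketch: split long-format records by whether t is missing.
split_by_time_sketch <- function(df) {
  inv <- is.na(df$t)
  list(
    invariant = df[inv, , drop = FALSE],
    dependent = df[!inv, , drop = FALSE],
    n_vars_invariant = length(unique(df$variable_name[inv])),
    n_vars_dependent = length(unique(df$variable_name[!inv])),
    n_rows_invariant = sum(inv),
    n_rows_dependent = sum(!inv)
  )
}

df <- data.frame(
  ID = c(1, 2, 1, 1, 2),
  t = c(NA, NA, 0.5, 1.5, 3),
  variable_name = c("age", "age", "hr", "hr", "hr"),
  variable_value = c(70, 64, 80, 82, 85),
  stringsAsFactors = FALSE
)
parts <- split_by_time_sketch(df)
```

In the run above, 60000 time-invariant rows for 10000 patients and 6 variables is consistent with exactly one record per patient per variable, which agrees with the "0 out of 60000" missing count reported in step 2-A.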
[53] "--------------------------------------------------------------------------------"
[54] "2-A) Transform time-invariant data"
[55] "--------------------------------------------------------------------------------"
[56] "(N x ^d) table :\t (10000, 6)"
[57] "number of missing entries :\t 0 out of 60000 total"
[58] "Time elapsed: 0.034801 seconds"
[59] ""
[60] "Output"
[61] "S_all, binary features :\t (10000, 8103)"
[62] "Time elapsed: 1.310230 seconds"
[63] ""
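Step 2-A turns the (10000, 6) table into 8103 binary columns. One way to picture that in R is one indicator column per observed (variable, value) pair. This is a simplified sketch, not the pipeline's encoder: it treats every value as categorical and does not bin numeric values, so its column count will not match the pipeline's in general. It also assumes the input was already restricted to the population. `onehot_invariant_sketch` is a hypothetical name.

```r
# Hypothetical sketch: one binary column per observed (variable, value) pair.
# Simplification: all values are treated as categorical (no binning).
# Assumes every inv$ID is in ids (i.e. the data were pre-filtered).
onehot_invariant_sketch <- function(inv, ids) {
  vars <- sort(unique(inv$variable_name))
  # Wide N x d table of values (as character); NA where nothing was recorded
  wide <- matrix(NA_character_, nrow = length(ids), ncol = length(vars),
                 dimnames = list(as.character(ids), vars))
  wide[cbind(match(inv$ID, ids), match(inv$variable_name, vars))] <-
    as.character(inv$variable_value)
  feats <- list()
  for (v in vars) {
    col <- wide[, v]
    for (val in sort(unique(col[!is.na(col)]))) {
      # 1 where this patient's value equals val, 0 otherwise (including NA)
      feats[[paste0(v, ":", val)]] <- as.integer(!is.na(col) & col == val)
    }
  }
  S <- do.call(cbind, feats)
  rownames(S) <- as.character(ids)
  S
}

inv <- data.frame(
  ID = c(1, 2, 3, 1, 2, 3),
  variable_name = c("sex", "sex", "sex", "race", "race", "race"),
  variable_value = c("F", "M", "F", "A", "B", "B"),
  stringsAsFactors = FALSE
)
S_all <- onehot_invariant_sketch(inv, ids = 1:3)
```

Each patient gets exactly one "on" column per variable here, so the row sums equal the number of time-invariant variables.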
[64] "--------------------------------------------------------------------------------"
[65] "3-A) Post-filter time-invariant data"
[66] "--------------------------------------------------------------------------------"
[67] "Original : 8103"
[68] "Nearly-constant: 8086"
[69] "Correlated : 1"
[70] "Time elapsed: 1.310230 seconds"
[71] ""
[72] "Output"
[73] "S: shape=(10000, 16), density=0.282"
[74] "Total time: 1.325858 seconds"
[75] ""
[76] ""
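The post-filter logged above left 8103 - 8086 - 1 = 16 columns, matching S: shape=(10000, 16). A minimal R sketch of the two filters, assuming binary features: a column is treated as nearly constant when its less common value occurs in at most `theta_2` of rows, and, as a simplification of "correlated", only exactly identical columns are treated as redundant (the first copy is kept). `postfilter_sketch` is hypothetical and is not claimed to match the pipeline's selectors.

```r
# Hypothetical sketch of a two-stage column filter on a binary matrix.
postfilter_sketch <- function(X, theta_2) {
  p <- colMeans(X != 0)                      # fraction of rows where a feature is on
  nearly_const <- pmin(p, 1 - p) <= theta_2  # minority value is too rare
  X <- X[, !nearly_const, drop = FALSE]
  dup <- duplicated(t(X))                    # identical columns after the first
  list(X = X[, !dup, drop = FALSE],
       n_nearly_constant = sum(nearly_const),
       n_correlated = sum(dup))
}

# Column c is constant and b duplicates a, so only a and d survive
X <- cbind(a = c(1, 0, 1, 0), b = c(1, 0, 1, 0),
           c = c(0, 0, 0, 0), d = c(0, 1, 1, 0))
res <- postfilter_sketch(X, theta_2 = 0.001)
```

For binary features this minority-fraction rule is equivalent to requiring the variance p(1 - p) to exceed theta_2(1 - theta_2), since p(1 - p) is symmetric about 0.5.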
[77] "--------------------------------------------------------------------------------"
[78] "2-B) Transform time-dependent data"
[79] "--------------------------------------------------------------------------------"
[80] "Total variables : 36"
[81] "Frequent variables : []"
[82] "M1 = 0"
[83] "M2 = 36"
[84] "k = 1 ['mean']"
[85] ""
[86] "Transforming each example..."
[87] "Batches of size 100: 100"
[88] "100%|██████████| 100/100 [00:17<00:00, 5.59it/s]"
[89] ""
[90] "Parallel processing done"
[91] "DONE: Transforming each example..."
[92] "(freq) number of missing entries :\t 0.0 out of 10000x52x0=0 total"
[93] "(freq) number of imputed entries :\t 0.0"
[94] "(freq) number of not imputed entries :\t 0.0"
[95] "(non-freq) number of missing entries :\t 17978634 out of 10000x52x36=18720000 total"
[96] ""
[97] "(N x L x ^D) table :\t (10000, 52, 36)"
[98] "Time elapsed: 42.245755 seconds"
[99] "Discretizing features..."
[100] ""
[101] "Discretizing categorical features..."
[102] "100%|██████████| 36/36 [00:17<00:00, 2.02it/s]"
[103] "Finished discretizing features"
[104] ""
[105] "Output"
[106] "X_all: shape=(10000, 52, 2619), density=0.013"
[107] "Time elapsed: 86.018476 seconds"
[108] ""
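Step 2-B builds the (10000, 52, 36) table: with T = 52 and dt = 1 there are L = 52 time bins, each cell holds one summary (k = 1, the mean), and `impute_method = "ffill"` names forward fill. The sketch below is a hedged base-R illustration assuming numeric values, bins of the form [(j - 1)dt, j dt), and forward fill applied to every variable. That last point is a simplification: the log reports no frequent variables and 0 imputed entries, so the pipeline evidently scopes imputation differently. `bin_and_ffill_sketch` is hypothetical.

```r
# Hypothetical sketch of step 2-B: bin, aggregate with the mean, forward-fill.
# Assumes numeric values and bins of width dt starting at t = 0.
bin_and_ffill_sketch <- function(dep, ids, T_, dt) {
  L <- ceiling(T_ / dt)
  vars <- sort(unique(dep$variable_name))
  # Bin j covers [(j - 1) * dt, j * dt); a record at t == T_ goes in bin L
  bin <- pmin(floor(dep$t / dt) + 1, L)
  # N x L x D array of cell means; cells with no records come back as NA
  X <- tapply(as.numeric(dep$variable_value),
              list(factor(dep$ID, levels = ids),
                   factor(bin, levels = seq_len(L)),
                   factor(dep$variable_name, levels = vars)),
              mean)
  n_missing <- sum(is.na(X))
  # Forward fill along time: an empty bin takes the previous bin's value,
  # so leading empty bins stay NA
  for (j in seq_len(L)[-1]) {
    cur <- X[, j, ]
    X[, j, ] <- ifelse(is.na(cur), X[, j - 1, ], cur)
  }
  list(X = X, n_missing = n_missing, total = length(X))
}

dep <- data.frame(
  ID = c(1, 1, 1, 2),
  t = c(0.2, 0.7, 2.5, 1.0),
  variable_name = "hr",
  variable_value = c(80, 90, 70, 60),
  stringsAsFactors = FALSE
)
res <- bin_and_ffill_sketch(dep, ids = 1:2, T_ = 4, dt = 1)
```

Patient 1's two readings in bin 1 average to 85, and patient 2's first bin stays NA because there is nothing earlier to carry forward. For scale, the run above found 17978634 of 18720000 cells (about 96%) empty before imputation.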
[109] "--------------------------------------------------------------------------------"
[110] "3-B) Post-filter time-dependent data"
[111] "--------------------------------------------------------------------------------"
[112] "(10000, 52, 2619) 0.013367035274767234"
[113] "Original : 2619"
[114] "Nearly-constant: 2579"
[115] "*** time: 22.88884687423706"
[116] "Correlated : 0"
[117] "*** time: 42.05948328971863"
[118] ""
[119] "Output"
[120] "X: shape=(10000, 52, 40), density=0.850"
[121] "(10000, 52, 40) 0.8497118269230769"
[122] "Time elapsed: 128.077959 seconds"
[123] ""
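The `density` figures in the log are the fraction of nonzero entries; for example, 0.8497118269230769 of 10000 x 52 x 40 cells implies 17674006 nonzero cells in X. The hypothetical helpers below compute density and flatten an N x L x D array into an (N*L) x D matrix, which is one plausible way to apply column filters like those in 3-B to time-dependent data (an assumption, not something the log confirms).

```r
# Hypothetical helpers for inspecting an N x L x D feature array.
density_sketch <- function(X) mean(X != 0)  # fraction of nonzero entries

flatten_time_sketch <- function(X) {
  d <- dim(X)
  # R arrays are column-major, so rows come out as (patient, bin) pairs
  # with the patient index varying fastest
  matrix(X, nrow = d[1] * d[2], ncol = d[3],
         dimnames = list(NULL, dimnames(X)[[3]]))
}

X <- array(0, dim = c(2, 3, 2), dimnames = list(NULL, NULL, c("f1", "f2")))
X[1, 1, "f1"] <- 1
X[2, 3, "f2"] <- 1
dens <- density_sketch(X)     # two nonzero cells out of 12
X2 <- flatten_time_sketch(X)  # 6 rows (2 patients x 3 bins), 2 columns

# Nonzero cells implied by the reported density of X
nnz <- round(0.8497118269230769 * 10000 * 52 * 40)
```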
[124] "Output"
[125] "X: shape=(10000, 52, 40), density=0.850"
[126] "Total time: 137.240449 seconds"
[127] ""